Abstract 
Analogical reasoning is a fundamental cognitive skill of drawing relationships between 
representations, often between prior knowledge and new representations, that allows for 
bootstrapping cognitive and language development (Gentner, 2003). Analogical reasoning 
proficiency develops substantially during childhood, though the mechanisms underlying this 
development have been debated, with developing cognitive resources as one proposed 
mechanism (Richland, Morrison & Holyoak, 2006). We explore the role of executive function 
(EF) in supporting children’s analogical reasoning development, with the goal of determining 
whether predicted aspects of EF were related to analogical development at the level of individual 
differences. We assessed 5- to 11-year-old children’s working memory, inhibitory control, and 
cognitive flexibility using measures from the NIH Toolbox Cognition battery (2013). Individual 
differences in children’s working memory best predicted performance on an analogical mapping 
task. These findings underscore the need to consider fundamental cognitive capacities in 
comprehensive theories of children’s reasoning development. 

Analogical reasoning (the cognitive process of drawing relationships between 
representations, often between prior knowledge and new representations) is a fundamental skill 
that develops dramatically in proficiency and resistance to distraction during childhood. Analogy 
plays a central role in higher-level cognition, and its ubiquitous and wide-ranging influence 
makes its developmental underpinnings essential to understanding human cognition more 
generally (Gentner, 2003; Hofstadter & Sander, 2013). Practically, analogical thinking is an 
important tool for learning. It enables children to, for example, extend existing knowledge to 
new contexts, even if the representational systems look different. For instance, a child might apply 
what they know about human energy needs (e.g., people need to eat for energy) to plants (e.g., 
plants likewise need energy input), though humans and plants differ in many respects. Building 
analogical reasoning skills is also a key objective for educational contexts, where children must 
build the skills to scaffold their own knowledge, to transfer it to new contexts, to explain new 
information, and to solve new problems (Goldwater & Schalk, 2016; Richland & Simms, 2015). 
Understanding the mechanisms underlying analogical reasoning and development, therefore, is 
vital to identifying and intervening on points of dysfunction. In this paper, we explore how one 
particular factor, executive function, underlies the development of children’s analogical 
reasoning. 

Analogical Reasoning and Development 

Formally, analogies are driven by alignment between systems of relations. Two 
situations are analogous if they share relational similarities, regardless of other superficial 
properties or similarities. For example, plant stems are like drinking straws because they share 
functional and mechanistic relationships; both deliver liquid nourishment to a living organism, 
and both use differential pressure to move the liquid along the shaft. A child who understands 
how drinking straws work may be able to apply this knowledge to help them understand the less 
familiar domain of plant stems. 

Performing analogical reasoning is not trivial. Assuming a reasoner has recognized an 
opportunity for aligning the relationships in two or more analogues (e.g., the relationships between 
food and humans, and sunlight and plants) — no small feat in itself (Gick & Holyoak, 1980; 
Loewenstein, Thompson, & Gentner, 1999) — they must first encode the relational information 
from both analogues. These relational structures must be mentally maintained and manipulated 
to find correspondences between them (e.g., between the energy produced when a person 
metabolizes food and the energy produced when a plant photosynthesizes sunlight). If 
worthwhile correspondences are not initially found, the analogy must be discarded in favor of 
another, or the representations must be flexibly modified to enable a better alignment (Kurtz, 
2005; Yan, Forbus, & Gentner, 2003). For example, children may not initially see how food 
corresponds to sunlight, because food is eaten whereas sunlight is absorbed. However, this 
analogy becomes clear when children understand that both eating food and absorbing sunlight 
are intake processes. And all of this must take place while suppressing attention to irrelevant or 
extraneous information (Krawczyk et al., 2008). 

This research explores whether Executive Function (EF) resources can explain patterns of 
analogical reasoning for children between the ages of 5 and 11, given analogy’s high cognitive 
demands as described here. In particular, children's analogical reasoning improves along at least 
two key dimensions: the ability to resist perceptual distraction to prioritize relational 
information, and the ability to handle and manipulate increasingly complex representations 
during alignment and mapping. We first describe this developmental trajectory, and then explain 
the rationale for EF as an explanatory mechanism. 


Object Focus and Relational Shift 

When children are given opportunities to engage in analogical reasoning, they often 
prioritize salient but non-relational information over relational information (e.g., Daehler & 
Chen, 1993; Rattermann & Gentner, 1998; Thibaut et al., 2010a). In particular, young children 
tend to rely on perceptually-similar objects, which can detract from analogical reasoning 
performance if those object matches compete with relational matches (Christie & Gentner, 2010; 
Richland, Morrison, & Holyoak, 2006). 

However, during development, children’s analogical reasoning skills grow considerably. 
They become increasingly oriented toward relations, a pattern described as the relational shift 
(Gentner, 1988; Gentner & Rattermann, 1991). Their early focus on holistic and object similarity 
gives way to greater appreciation for simple relational similarity, and eventually more complex, 
interconnected relational structures (e.g., Gentner & Toupin, 1986; Chen, 1996; Loewenstein & 
Gentner, 2005; Richland et al., 2006; Thibaut et al., 2010b). 

Integrating Multiple Relations 

Children also become increasingly adept at engaging with and integrating multiple 
relational structures as they develop. In Richland and colleagues’ (2006) scene analogy task, 
children were asked to align and map relations across two scenes. Some pairs of scenes 
contained only a single event relation (e.g., woman feeding boy, man feeding bird). Others 
depicted a more complex event scene consisting of two linked relations (e.g., woman feeding 
boy feeding dog, a man feeding bird feeding hatchlings). Across two studies, younger children 
were less accurate on complex, 2-relation trials than the simpler, 1-relation trials. For older 
children, the effect of scene complexity was absent or diminished, suggesting that older children 
were less impaired by multiple relations than the younger children. 


Thus, during childhood, children’s analogical reasoning improves as they shift their 
attention from objects to relations and become better at engaging with larger and more complex 
structures. 

Executive Function 
Development of Executive Function 

Similar to analogical reasoning, executive function also develops significantly during 
childhood. Executive function (EF) refers to the coordination of attention and action to carry out 
intentional, goal-directed behavior (for reviews, see Carlson, Zelazo, & Faja, 2013; Diamond, 
2013). Over development, children become better able to regulate their behavior under varying 
circumstances. The processes supporting coordinated, goal-directed behavior become 
increasingly differentiated with age, and three separable but interrelated component functions 
emerge: working memory, inhibitory control, and cognitive flexibility (or shifting) (Miyake et al., 
2000; Wiebe et al., 2011). 

Working memory (WM) is the store of information that is active and consciously 
available at a given time. The amount of information that can be actively held and — importantly 
— manipulated by an individual is their WM capacity, which is limited (Baddeley, 2012; Cowan, 
2010). WM capacity increases as children mature, allowing them to hold and manipulate more 
information, tackle tasks with greater representational demands, and appear more adult-like 
(Crone et al., 2006; Gathercole, Pickering, Ambridge, & Wearing, 2004). 

Inhibitory Control (IC) refers to the ability to suppress attention and action to irrelevant 
or conflicting information, especially when such responses are prepotent. With age, children’s 
ability to inhibit attention and action improves, and they are better able to focus on relevant 
information and resolve conflict in service of task goals (Davidson et al., 2006; Gerstadt, Hong, 
& Diamond, 1994; Rueda et al., 2004). 

Cognitive flexibility (CF) is the ability to adaptively switch tasks, broadly construed. This 
includes being able to think about something in multiple ways, efficiently switch goals or 
activities, or take multiple perspectives. Like working memory and inhibitory control, cognitive 
flexibility improves as children develop, enabling children to successfully negotiate more 
complex tasks with shifting attentional demands or rule sets (Cepeda et al., 2001; Zelazo, Frye, 
& Rapus, 1996). 

Though separable, WM, IC, and CF work together to support non-automated, goal- 
directed behavior. 

Executive Function and Analogy 

Reflecting the two ways analogical reasoning improves over development — in integrating 
multiple relations, and in shifting attention from objects to relations — gains in working memory 
and inhibitory control resources, specifically, have been theorized to support analogical 
development. 

Working Memory. Analogies are highly demanding of working memory, because (1) 
relational information is inherently more complex, and therefore more costly to represent, than 
object or featural information (Andrews & Halford, 2002; Halford, Wilson, & Phillips, 1998); 
(2) multiple relational representations must be held in working memory simultaneously; and (3) 
these representations must be manipulated to, for example, integrate multiple relations into a 
systematic structure or adjust the representations to match across analogues. 

Evidence from work with both children and adults implicates working memory in 
analogical reasoning. When the representational demands of analogy tasks are increased, for 
instance by increasing the size of the relational structures, children show decrements in 
reasoning, which diminish with age (e.g., Richland et al., 2006). In healthy adults, occupying 
working memory resources impairs integration of multiple relations (Waltz et al., 2000). Adults 
with damage to brain areas associated with working memory similarly show deficits in 
processing complex relations (Waltz et al., 1999). And as with children, increasing representational 
complexity during analogical reasoning decreases adults’ accuracy (Cho, Holyoak, & Cannon, 2007; 
Viskontas et al., 2004). 

Inhibitory Control. Analogical reasoning also imposes demands on inhibitory control 
processes to override attention to salient object properties or irrelevant perceptual similarities in 
favor of relational correspondences. This may be especially true for young children, who have a 
strong tendency to orient toward object and perceptual similarity (Gentner & Rattermann, 1991). 

Indeed, introducing conflicting object similarity to an analogical reasoning task hurts 
children’s reasoning, but they become less susceptible to distraction with age (e.g., Loewenstein 
& Gentner, 2005; Richland et al., 2006). Computational models have also shown that increases 
in model inhibition, along with relational knowledge, successfully recreate patterns of analogical 
development (Morrison, Doumas, & Richland, 2011). In adults, damage to brain regions 
associated with inhibitory control is associated with failure to prioritize relational similarity over 
featural similarity (Krawczyk et al., 2008; Morrison et al., 2004), much as young children do. 

Present Study 

The role of executive resources in analogical development has been investigated primarily 
through manipulation of task demands (e.g., Richland et al., 2006; Thibaut et al., 
2010a) or through longitudinal prediction designs (Richland & Burchinal, 2013), but no work 
has yet established that individual differences in children's EF capacities indeed correspond to their 
analogical reasoning skills. Thus, it remains possible that variations in performance previously 
attributed to EF could instead be explained by task difficulty or other age-linked factors. 

Here, we assessed the EF capacities (including working memory, inhibitory control, and 
cognitive flexibility) of children across a wide age range (5 to 11 years old) and used 
performance on these measures to predict performance on Richland and colleagues’ (2006) scene 
analogy mapping task. This task was selected because (1) it has been successfully used with, and 
shows variation for, children as young as 3 and as old as 14 years, (2) it strategically manipulates 
two factors that should impose distinct demands on children’s EF resources — the presence of an 
object match distractor and the number of relations that must be integrated to find an alignment — 
and (3) it uses relations that are familiar to children in our age range, ensuring that variations in 
relational knowledge could not explain changes in performance across development on these 
tasks (e.g., Gentner & Rattermann, 1991). 

We hypothesized that individual differences in children’s EF capacities would predict 
children’s analogy performance. In our account, EF interacts with other factors that change with 
age (e.g., domain knowledge, relational language, attentional biases, and strategy use) 
to support changes in analogical reasoning over development. Though not our central 
hypothesis, our design allows us to further explore whether working memory and inhibitory 
control are differentially related to distinct aspects of analogical reasoning behavior, with 
working memory predicting performance involving relational integration and inhibitory control 
predicting performance involving featural distraction. 

This hypothesis stands in contrast to an alternative in which age, but not EF, is related 
to improvements in children’s analogical reasoning. In this alternative account, improvements in 
analogical reasoning over development are a consequence of these other age-related changes 
alone, and after controlling for age, EF should not be related to analogical performance. 
Although work with adults has demonstrated a relationship between executive resources and 
analogical performance (e.g., Morrison et al., 2004; Waltz et al., 2000), these are extreme cases 
of clinical dysfunction or artificial manipulation that may not reflect the typical relationship 
between maturational improvements in EF and analogical reasoning over development. 
Method 

Participants 

A total of 67 5- to 11-year-old children were recruited to participate in this study from the 
child participant subject pool at the university. This age range was selected because prior work 
shows age-related changes in patterns of analogical reasoning performance on our outcome 
measure of interest (the Scene Analogy task, described below) within this range, and because this 
age group could complete multiple tasks in a single session. Two children did not complete the 
outcome measure of interest. One child scored more than three standard deviations below the 
mean on this task and was excluded from all further analyses. The remaining 
64 children ranged in age from 5.0 to 11.4 years (Mage = 7.25 years, SDage = 1.56 years). This 
sample consisted of 36 boys and 28 girls, from a variety of racial/ethnic backgrounds (39% 
White, 31% Black, 30% Other). Mothers’ education levels also varied, though the majority had 
completed college or beyond (41% graduate/professional degree, 37% Bachelor’s degree, 19% 
Associate’s degree, 3% high school/GED or missing). 
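
The exclusion rule described above can be sketched as follows. This is a minimal illustration only: it assumes the mean and standard deviation are estimated from all children who completed the task (the text does not say whether the outlying score was included in those estimates), and the function name and the example scores are hypothetical, not study data.

```python
from statistics import mean, stdev


def exclude_low_scorers(scores, n_sd=3.0):
    """Drop scores more than n_sd sample SDs below the mean.

    Assumption: the mean and SD are estimated from all supplied scores,
    including any score that ends up being excluded.
    """
    cutoff = mean(scores) - n_sd * stdev(scores)
    kept = [s for s in scores if s >= cutoff]
    return kept, cutoff


# Hypothetical proportions correct, for illustration only (not study data).
example_scores = [0.80] * 15 + [0.90] * 15 + [0.05]
kept_scores, cutoff = exclude_low_scorers(example_scores)
```

With these made-up values, only the single very low score falls below the cutoff and is dropped.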
Materials and Procedure 

Data were collected as part of a larger study exploring child and caregiver interactions. 
The outcome of interest was children’s analogical reasoning skill, as measured by the Scene 
Analogy task (Richland et al., 2006). Children also completed three measures of executive 
function and working memory from the cognition battery of the NIH Toolbox (Gershon, 
Wagster, Hendrie, Fox, Cook, & Nowinski, 2013): the Dimensional Change Card Sort Test, a 
measure of cognitive flexibility; the Flanker Inhibitory Control and Attention Test; and the List 
Sorting Working Memory Test. These measures were used to predict performance on the Scene 
Analogy task. 

The executive function (EF) measures were administered in a single session that also 
included tasks engaging children and their caregivers in independent and joint description of 
select picture pairs from the Scene Analogy task. Performance on the parent and child scene 
description tasks is beyond the scope of this paper and is not discussed further. All children 
completed the tasks in a fixed order that interspersed the EF measures and description tasks and 
ended with the Scene Analogy task. 

Scene Analogy task. Children’s analogical reasoning skill was measured using the Scene 
Analogy task, an analogical mapping task (Richland et al., 2006; available from Lindsey 
Richland on request). In this task, children are presented with 20 pairs of pictures depicting 
analogous event scenes (Figure 1). The child’s task is to identify the object in the target picture 
that plays the same role in the events as an object identified in the source picture (the relational 
match). To do so requires that children identify, align, and prioritize the relations in the scenes. 

Picture pairs varied along two dimensions: the presence of an irrelevant and potentially 
distracting object match in the target (Distractor versus No Distractor problems), and the number 
of relations to be mapped across pictures (1-relation versus 2-relation problems). Distractor 
problems were designed to assess children’s ability to suppress attention and responses to object 
similarity in favor of relational similarity. Failure to do so should result in featural errors, 
wherein the child selects the object match instead of the relational match. Two-relation problems 
were designed to assess children’s ability to integrate multiple relations and align larger 
relational structures. On these trials, as on all others, noticing the relation but failing to properly align the 
corresponding roles should result in relational errors, wherein the child selects a participant in 
the event in a different role (Figure 1, d). 

The task was administered to the participants in paper form individually by an 
experimenter. Participants received instructions and two sample problems with feedback to 
ensure that they understood the goal of the task. While presenting the first, 1-relation sample 
problem, the experimenter explained: “There is a certain pattern in the top picture, and the same 
pattern happens in the bottom picture. [...] First I will tell you what pattern is happening in the 
top picture. Then I am going to put a sticker on one thing in the top picture, and your job is to tell 
me what is in the same part of the pattern in the bottom picture, so I can put a sticker on that 
too.” The events in both pictures were described for the child and used to illustrate the task. Only 
one child failed to select the correct match on the first sample picture, and they were corrected 
and heard the correspondences between pictures described again. 

The second sample problem introduced the 2-relation problems. While presenting the 
second sample, the experimenter explained: “Now sometimes the pattern will have two parts, 
like the one you just saw [...] and sometimes the pattern will have three parts [referring to the 
number of participants in the event]. Let me show you what I mean.” Again, the experimenter 
described the events in both pictures to illustrate. All participants correctly selected the relational 
match on the second sample problem. 

Next, children solved 20 test problems. For each problem, the experimenter first 
described the event and identified the key object in the source picture and then asked the child to 
find the corresponding object from the target picture. For example, “In the top picture, there is a 
woman feeding a boy. If I put a sticker on the boy in the top picture, where should I put a sticker 
in the bottom picture?” 

The 20 problems consisted of five of each of the four problem types generated by crossing 
Distractor and Complexity (2 x 2), and each participant saw only one version of each event. 
Trials were administered in five semi-randomized orders, and the version of each event pair was 
counterbalanced across orders. Children had seen some of these pictures before in a description 
task with their parent; some pictures they saw were identical to the test problem, and some were 
different versions of the event to be mapped. Regardless, the pattern of responding when 
including all 20 problems did not differ qualitatively overall from the pattern found when 
previously seen events were excluded, so all analyses of the Scene Analogy task include all 20 
problems. 
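
The composition of the 20-problem test set can be sketched as below. This is a minimal illustration of the 2 x 2 crossing with five problems per cell; it does not reproduce the five semi-randomized orders or the counterbalancing of event versions, whose details are not given here. All names in the code are hypothetical.

```python
from collections import Counter
from itertools import product

DISTRACTOR_LEVELS = ("No Distractor", "Distractor")
COMPLEXITY_LEVELS = ("1-relation", "2-relation")
PROBLEMS_PER_CELL = 5  # five problems of each of the four types


def build_problem_types():
    """Return the (distractor, complexity) type of each test problem."""
    cells = product(DISTRACTOR_LEVELS, COMPLEXITY_LEVELS)
    return [cell for cell in cells for _ in range(PROBLEMS_PER_CELL)]


problem_types = build_problem_types()
cell_counts = Counter(problem_types)  # number of problems per design cell
```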

NIH Toolbox tasks. Three aspects of children’s executive functions were measured 
using three tasks from the NIH Toolbox (Gershon et al., 2013): the Dimensional Change Card 
Sort Test (DCCS), a measure of cognitive flexibility and task switching; the Flanker Inhibitory 
Control and Attention Test (Flanker); and the List Sorting Working Memory Test (List Sort). 
The NIH Toolbox is a validated and normed computerized battery to assess cognitive, emotional, 
motor, and sensory function. Tasks can be administered from age 3 to 85 and take only a few 
minutes each. 

NIH-TB DCCS (cognitive flexibility). This task is a measure of children’s ability to 
flexibly switch between tasks. In this test, children must match a test picture (e.g., a yellow ball) 
to one of two target pictures, one of which matches the test picture in color (e.g., a yellow truck) 
and the other in shape (e.g., a blue ball). Participants first sort the test pictures along one 
dimension, and then they must switch sorting rules and sort along the other dimension. For 
example, they may be asked to match the test and target pictures by shape for four trials, and 
then to match by color for one trial, and then resume matching by shape. The program tells 
participants which dimension to sort by at the start of each trial with written and verbal 
dimension labels (e.g., “COLOR”). Task administration took approximately four minutes. 

The DCCS is scored by combining two scoring vectors: accuracy and reaction time. The 
score for each vector ranges from 0 to 5 points, and thus, the full computed score ranges from 0 
to 10. Accuracy is always considered first; if accuracy does not exceed 80%, only the accuracy 
vector is reported (i.e., the score will be less than 5, with chance performance at 2.5). If accuracy 
is greater than 80%, the reaction time and accuracy vectors are combined into the computed 
score. 
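
The two-vector scoring rule described above can be sketched as follows. This is an illustration under stated assumptions, not the NIH Toolbox implementation: the accuracy vector is assumed to be 5 times the proportion correct (consistent with chance performance scoring 2.5), and the reaction time vector is taken as an already computed input between 0 and 5, because its derivation from raw reaction times is not described here. The function and parameter names are hypothetical.

```python
def two_vector_score(n_correct, n_trials, rt_vector):
    """Combine accuracy and reaction-time vectors into a computed score (0-10).

    Assumptions (not stated in the text): the accuracy vector equals
    5 * proportion correct, and rt_vector (0-5) has already been derived
    from reaction times by a separate routine.
    """
    if n_trials <= 0 or not 0 <= n_correct <= n_trials:
        raise ValueError("need n_trials > 0 and 0 <= n_correct <= n_trials")
    if not 0.0 <= rt_vector <= 5.0:
        raise ValueError("rt_vector must lie between 0 and 5")

    accuracy_vector = 5.0 * n_correct / n_trials
    if n_correct / n_trials <= 0.80:
        # Accuracy does not exceed 80%: only the accuracy vector is reported.
        return accuracy_vector
    # Accuracy exceeds 80%: both vectors contribute to the computed score.
    return accuracy_vector + rt_vector
```

For example, under these assumptions a child at chance (20 of 40 correct) would receive 2.5 regardless of speed, whereas a child with 36 of 40 correct and a reaction time vector of 3.0 would receive 7.5.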

Test-retest reliability for the DCCS task is high for 3- to 15-year-olds, ICC = 0.92 (95% 
CI .86, .95), and the measure correlates highly with measures of the same construct for both 
younger (3- to 6-year-olds: r = .69, p < .0001) and older children (8- to 15-year-olds: r = .64, p < 
.0001), demonstrating good convergent construct validity (Zelazo, Anderson, Richler, Wallner- 
Allen, Beaumont, & Weintraub, 2013). 

NIH-TB Flanker (attention and inhibitory control). This test is a version of the Eriksen 
flanker task, derived from the Attention Network Task (Rueda et al., 2004). This task tests the 
ability to inhibit attention and behavioral responses to irrelevant features. On each trial, 
participants see a horizontal line made up of five fish (for children younger than 8) or five arrows 
(for participants 8 or older). Participants are instructed to push a button corresponding to the 
direction of the central object (e.g., push the left arrow key if the center fish is pointing left), 
while ignoring the direction of the flanking objects. On congruent trials, all the fish/arrows point 
the same way; on incongruent trials, the middle fish/arrow points the opposite direction of the 
fish/arrows on either side. Task administration took approximately two to four minutes. 

For children younger than eight, if accuracy on the 20 fish trials meets or exceeds 90%, 
they are also given 20 arrow trials. Older children and adults complete only the 20 arrow trials, 
and their scores are computed as though they were 100% accurate on the fish trials, because these age 
groups typically perform at ceiling on the fish trials. 

As in the DCCS, the Flanker uses a 2-vector scoring method, which combines accuracy 
and reaction time. The score for each vector ranges from 0 to 5 points, and computed scores 
range from 0 to 10. Reaction time is only considered if accuracy exceeds 80%; otherwise, only 
the accuracy vector is included and the computed score will be less than 5 (with chance 
performance at 2.5). 

Test-retest reliability for the Flanker task is high for 3- to 15-year-olds, ICC = 0.92 (95% 
CI .86, .95), and the measure correlates highly (for 3- to 6-year-olds: r = .60, p < .0001) and 
moderately (for 8- to 15-year-olds: r = .34, p = .002) with measures of the same construct, 
demonstrating good convergent construct validity (Zelazo et al., 2013). 

NIH-TB List Sort (working memory). This task is adapted from Mungas' List Sorting 
task from the Spanish and English Neuropsychological Assessment Scales (Mungas, Reed, 
Marshall, & Gonzalez, 2000). After a brief training (for children aged seven and older — a more 
elaborate training is used for children aged three to six to ensure task comprehension), children 
hear a list of either animals or fruit in a consistent random order. Each word is accompanied by a 
picture of the named object, which varies in size relative to the other objects in the list. 

Participants are directed to report the items in size order, from smallest to largest. Once 
participants reach threshold with single-category sorting (by getting two trials wrong at a given list 
size), they then see and hear lists of both fruit and animals intermixed, and are directed to 
report first the fruit items in size order, from smallest to largest, followed by the animals, from 
smallest to largest. When children reach threshold on the two-category sorting, the task ends. 
The approximate time for administration was 7 minutes. 

The List Sort task is scored by summing the total number of items correctly recalled, 
which can range from 0-26. 

Test-retest reliability for the List Sort task is high in this age group (3-15 years), ICC = 
0.86 (95% CI .78, .91). For both younger children (3-6 years) and older children (8-15 years), the 
measure correlates with other measures of the construct, rs = .57, all ps < .0001, suggesting the 
measure has good convergent construct validity in this age range (Tulsky, Carlozzi, Chevalier, 
Espy, Beaumont, & Mungas, 2013). 

[Figure 1 panels: a. 1-relation, No distractor; b. 1-relation, Distractor]

Figure 1. Examples of the four types of Scene Analogy problems, which varied scene complexity 
(a, b: one relation; c, d: two relations) and the presence of a competing object match distractor (a, 
c: no distractor; b, d: distractor). Children were asked to find the object in the target/bottom 
picture that corresponded with the object of interest in the base/top picture (identified here with 
an arrow). The possible responses in the target (correct relational match, featural 
error, relational errors) are labeled in (d). 


Results 

Before analyzing the relationship between EF capacities and analogical reasoning 
performance, we first summarize performance on each set of tasks. 
Analogical Reasoning (Scene Analogy) 

Analyses were conducted to confirm that our participants replicated the patterns found in 
Richland et al. (2006). Statistical tests were conducted using all participants with usable scene 
analogy data, who were grouped by age into three categories: 5- to 6-year-olds (n = 25), 7- to 8- 
year-olds (n = 29), and 9- to 11-year-olds (n = 10). We elected to treat age categorically in our 
analysis in order to facilitate direct comparison with the previous work. However, it is important 
to note that these groupings result in low numbers for the 9- to 11-year-olds, potentially limiting 
the power to detect patterns of performance in this group. 

As shown next, accuracy and error data replicated previous findings (Richland et al., 
2006), which demonstrated that analogical skill increased with age. In particular, our results 
mirror those of Richland and colleagues in that: (1) over time children became less susceptible to 
featural distraction, with accuracy on problems with distracting object matches improving with age 
and the greatest improvements occurring between the youngest and middle age groups; and (2) over 
development children improved in their ability to create internally consistent mappings, with 
accuracy on problems with greater complexity improving with age and the greatest 
improvements occurring between the middle and oldest age groups. 

Accuracy. To explore the effects of object distractors and relational complexity on 
children’s relational matching performance, the proportions of children’s correct relational 
responses for each type of trial were entered into a mixed-measures ANOVA, with Age category 
as a between-subjects variable, and Distractor and Complexity as within-subjects variables 
(Figure 2, Table 1). In addition to main effects for Age, F(2,61) = 12.78, p < .001, ηp² = .295; 
Distractor, F(1,61) = 5.83, p < .05, ηp² = .087; and Complexity, F(1,61) = 26.96, p < .001, ηp² = 
.307, the interaction between Age and Distractor was significant, F(2,61) = 4.55, p < .05, ηp² = 
.130. No other interactions were significant. 
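
For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity F x df_effect / (F x df_effect + df_error). The sketch below applies that identity to the omnibus accuracy effects reported above; the function and variable names are illustrative, and small discrepancies in the third decimal can arise because the F values are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) for the effects in the omnibus accuracy ANOVA.
reported_effects = {
    "Age": (12.78, 2, 61),
    "Distractor": (5.83, 1, 61),
    "Complexity": (26.96, 1, 61),
    "Age x Distractor": (4.55, 2, 61),
}
effect_sizes = {
    name: round(partial_eta_squared(*stats), 3)
    for name, stats in reported_effects.items()
}
```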

To further explore the Age x Distractor interaction from the main analysis, as well as 
more precisely characterize age-related differences in patterns of accuracy on the task, a 2 x 2 
within-subjects ANOVA with Distractor and Complexity was calculated for each age group 
separately. The 5- to 6-year-olds showed a main effect of Distractor, F(1,24) = 19.03, p < .001, 
ηp² = .442, and of Complexity, F(1,24) = 21.05, p < .001, ηp² = .467. They were more accurate on 
No Distractor than Distractor problems and more accurate on 1-relation than 2-relation problems. 
For 7- to 8-year-olds, the analysis yielded a main effect of Complexity, F(1,28) = 16.92, p < 
.001, ηp² = .377. They were more accurate on 1-relation than 2-relation problems, but showed no 
main effect of Distractor. Finally, the 9- to 11-year-old group did not have any significant main 
effects or interactions. 

The oldest group (n = 10) may have been too small to detect differences across 
problem types. However, their consistently high performance also suggested the possibility of 
ceiling performance. Comparisons against perfect accuracy for each problem type revealed that 
this group’s accuracy did differ significantly from ceiling on 2-relation problems (No Distractor: 
t(9) = -2.69, p < .05; Distractor: t(9) = -3.00, p < .05), but not on 1-relation problems (No 
Distractor: t(9) = -1.49, p = .17; Distractor: t(9) = -1.41, p = .19). 
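
The comparison against ceiling can be illustrated as a one-sample t-test of a group's proportion 
correct against a fixed value of 1.0. The sketch below assumes that this is the test behind the 
t(9) values above, which the text implies but does not state outright. The scores are 
hypothetical and are not the study data, and the function name is our own. 

```python
import math
import statistics


def one_sample_t(scores, reference):
    """Return (t, df) for a one-sample t-test of the mean of `scores` against `reference`."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error, n - 1


# Hypothetical proportions correct for ten children on one problem type.
hypothetical_scores = [1.0, 0.75, 1.0, 0.75, 1.0, 1.0, 0.5, 1.0, 0.75, 1.0]
CEILING = 1.0
CRITICAL_T_DF9 = 2.262  # two-tailed critical value at alpha = .05 for df = 9

t_value, df = one_sample_t(hypothetical_scores, CEILING)
differs_from_ceiling = abs(t_value) > CRITICAL_T_DF9
print(f"t({df}) = {t_value:.2f}; differs from ceiling: {differs_from_ceiling}")
```

Note that the statistic is undefined when every child scores at ceiling, because the sample 
standard deviation is then zero. 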

Featural errors. To examine children’s susceptibility to competing object matches, the 
proportion of featural errors on each problem type was entered into a 3(Age) x 2(Distractor) x 
2(Complexity) mixed measures ANOVA (Table 1). Note that a featural error — selecting the 
object distractor — was not possible on the No Distractor trials (which means we would expect 
more featural errors on Distractor problems). 

In addition to main effects of Distractor, F(1,61) = 30.48, p < .001, ηp² = .333, and Age, 
F(2,61) = 7.27, p < .01, ηp² = .193, the Age x Distractor interaction was significant, F(2,61) = 
7.27, p < .01, ηp² = .193. 

To further explore the Age x Distractor interaction, a 2 x 2 within-subjects ANOVA with 
Distractor and Complexity was calculated for each group separately. A main effect of Distractor 
was found for the 5- to 6-year-olds, F(1,24) = 38.78, p < .001, ηp² = .618, and the 7- to 
8-year-olds, F(1,28) = 13.39, p < .01, ηp² = .324, but not the 9- to 11-year-olds, F(1,9) = 2.25, 
p = .17, ηp² = .200. The two younger groups made more featural errors on the Distractor than No 
Distractor problems, but this difference was not significant for the oldest group. No other main 
effects or interactions were significant. 

Relational errors. The proportion of children’s relational errors on each problem type 
was entered into a 3(Age) x 2(Distractor) x 2(Complexity) mixed measures ANOVA (Table 1). 
Note that there was one possible relational error on the 1-relation problems, and two possible 
relational errors on the 2-relation problems (which means we might expect more relational errors 
on 2-relation problems). 

The analysis yielded significant main effects of Complexity, F(1,61) = 23.46, p < .001, 
ηp² = .278, and Age, F(2,61) = 11.53, p < .001, ηp² = .274, and a marginally significant Age x 
Distractor interaction, F(2,61) = 2.78, p < .10, ηp² = .084. (Post-hoc comparisons suggested this 
unexpected interaction was driven by the 5- to 6-year-olds, who were marginally less likely to 
make relational errors on Distractor versus No Distractor problems, Bonferroni, p < .10, 
presumably because they were selecting the object match distractor more frequently on the 
Distractor problems.) 

To understand these patterns more clearly, a 2 x 2 within-subjects ANOVA with 
Distractor and Complexity was calculated separately for each age group. A significant main 
effect of Complexity was found for both the 5- to 6-year-olds, F(1,24) = 14.31, p < .01, ηp² = 
.374, and 7- to 8-year-olds, F(1,28) = 21.99, p < .001, ηp² = .440, but not for the 9- to 
11-year-olds, F(1,9) = 2.98, p = .12, ηp² = .249. The younger groups made more relational errors 
on 2-relation versus 1-relation problems, but this difference was not significant for the oldest 
group. No other main effects or interactions were significant. 

Figure 2. Proportion of correct relational matches on the Scene Analogy task. 


Table 1. Proportion of response types on the Scene Analogy task. 

                               5-6 year olds         7-8 year olds         9-11 year olds 
Problem type / Response        n = 25                n = 29                n = 10 
                               M (SD)                M (SD)                M (SD) 

1-relation, No Distractor 
  Correct                      0.82 (0.22)           0.94 (0.11)           0.96 (0.10) 
  Relational Error             0.09 (0.13)           0.05 (0.09)           0.03 (0.08) 
  Other Error                  0.10 (0.18)           0.01 (0.05)           0.02 (0.06) 

1-relation, Distractor 
  Correct                      [illegible] (0.30)    [illegible] (0.16)    [illegible] (0.13) 
  Featural Error               0.14 (0.16)           0.06 (0.11)           0.02 (0.06) 
  Relational Error             0.15 (0.19)           0.04 (0.10)           0.02 (0.06) 
  Other Error                  0.02 (0.09)           0.01 (0.04)           0.02 (0.06) 

2-relation, No Distractor 
  Correct                      [illegible] (0.27)    [illegible] (0.16)    [illegible] (0.16) 
  Relational Error             0.23 (0.21)           0.16 (0.15)           0.08 (0.10) 
  Other Error                  0.10 (0.16)           0.02 (0.06)           0.06 (0.14) 

2-relation, Distractor 
  Correct                      [illegible] (0.25)    [illegible] (0.20)    [illegible] (0.11) 
  Featural Error               0.18 (0.17)           0.08 (0.15)           0.02 (0.06) 
  Relational Error             0.26 (0.20)           0.10 (0.13)           0.06 (0.10) 
  Other Error                  0.03 (0.09)           0.01 (0.04)           0.02 (0.06) 

Overall 
  Correct                      [illegible] (0.22)    [illegible] (0.11)    [illegible] (0.00) 
  Featural Error               [illegible] (0.06)    [illegible] (0.05)    [illegible] (0.02) 
  Relational Error             0.18 (0.12)           0.09 (0.07)           0.05 (0.04) 
  Other Error                  0.06 (0.11)           0.01 (0.02)           0.03 (0.04) 

Executive Functions (Flanker, List Sort, DCCS) 

Due to fatigue, experimenter error, and technological failure, not all participants had data 
for all three EF tasks. Among the 64 children with the outcome measure, 5 did not complete any 
of the EF tasks, and so were excluded from further analyses. Among the 59 remaining children, 
none were missing the DCCS, 13 (22%) were missing the Flanker, and 9 (15%) were missing the 
List Sort Working Memory. Six of the participants with missing data were missing both 
measures. Age in months was related to missing the Flanker (r = -0.297, p = 0.023) and the List 
Sort Working Memory task (r = -0.317, p = 0.016). 

One participant who was missing only one EF measure (the List Sort) was excluded after 
visual inspection of scatterplots indicated, and correlations confirmed, that their consistently low 
performance across measures could skew the model used to impute missing data (below), as well 
as the relationship between the Scene Analogy task and the EF measures1. 

Missing scores for the fifteen participants who were missing one or two EF measures 
were imputed using multiple imputation, with the following predictors: age in months, available 
scores on the remaining EF tasks, race (dummy coded), and maternal education. We included race 
and maternal education because previous research has shown effects of both on EF (Little, 2017; 
Hackman, Gallop, Evans, & Farah, 2015). 

1 Prior to imputation, the correlation between the Flanker and DCCS tasks was significant when 
this participant’s data were included, r = .35, p < .05, but only marginal when excluded, r = 0.26, 
p < .10. Thus, we excluded this participant from the imputation model to avoid skewing the 
imputed data. Further, proportion correct on the Scene Analogy task was highly correlated with 
pre-imputation Flanker performance when this participant was included, r = .53, p < .001, but 
after exclusion, the effect size was much smaller, r = .36, p < .05. Therefore, we decided to also 
remove this participant from all further predictive analyses. 

Multiple imputation relies on the assumption that data are missing at random, which is a 
strong assumption to be making given that in our data, missingness is related to age. To 
compensate for this, and to satisfy the recommendation that the number of imputations exceed 
the percentage of missing data (White, Royston, & Wood, 2011), we generated 40 multiply 
imputed datasets, using the built-in functionality in SPSS version 22.0. The final statistics 
reported are the pooled estimates of the coefficients (B, standard error, and corresponding 
p-value), and the average for the F-statistic and R-squared (and corresponding p-value). After 
imputation, fifty-eight participants had complete data with which to analyze the relationships 
between performance on the EF measures and the scene analogy task (Table 2).2 
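
To make the idea of a pooled coefficient concrete, the sketch below applies Rubin's rules, the 
conventional way of combining estimates across multiply imputed datasets. We assume, without 
having verified it against the SPSS output, that the pooled values reported here follow the same 
logic. The three coefficient estimates and standard errors are hypothetical (the study used 40 
imputations), and the function name is our own. 

```python
import math
import statistics


def pool_rubin(estimates, standard_errors):
    """Pool one coefficient across imputed datasets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled_estimate = statistics.mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = statistics.mean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = statistics.variance(estimates)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total_variance)


# Hypothetical estimates of one regression coefficient from three imputed datasets.
hypothetical_estimates = [0.017, 0.019, 0.018]
hypothetical_ses = [0.008, 0.008, 0.008]

pooled_b, pooled_se = pool_rubin(hypothetical_estimates, hypothetical_ses)
print(f"pooled B = {pooled_b:.3f}, pooled SE = {pooled_se:.4f}")
```

The pooled standard error is larger than any single-dataset standard error whenever the 
estimates vary across imputations, which is how the procedure carries uncertainty about the 
missing values into the final test. 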


Table 2. Means and standard deviations on the three EF measures, before and after imputation. 

                            Before Imputation                               After Imputation 
                            DCCS           Flanker        List Sort         DCCS     Flanker   List Sort 
Possible range              0-10           0-10           0-26              0-10     0-10      0-26 

5-6 year olds     M (SD)    3.14 (2.02)    5.22 (1.07)    9.73 (2.84)       3.14     5.15      9.77 
                  N         21             12             15                21       21        21 
7-8 year olds     M (SD)    5.52 (1.50)    5.58 (1.30)    14.08 (2.38)      5.52     5.61      14.17 
                  N         27             24             25                27       27        27 
9-11 year olds    M (SD)    6.42 (0.57)    6.46 (0.72)    16.50 (2.99)      6.42     6.49      16.50 
                  N         10             9              10                10       10        10 
Total             M (SD)    4.81 (2.06)    5.66 (1.20)    13.26 (3.61)      4.81     5.60      12.98 
                  N         58             45             50                58       58        58 


2 Despite the change in sampling due to missing EF data, patterns of performance on the Scene 
Analogy task for the 58 participants analyzed in this part of the results section were 
indistinguishable from those for the 64 participants reported in the first part of this results 
section. 

To confirm that performance on the EF measures improved with age in our sample, as 
well as explore the relationships between performance on the three tasks, bivariate correlations 
between Age in months, Flanker, List Sort, and DCCS scores were conducted (Table 3). As 
expected, age was significantly correlated with performance on all three EF measures. The List 
Sort and the DCCS were significantly correlated. However, neither task significantly correlated 
with the Flanker. Partial correlations controlling for Age in months suggested that the 
collinearity between List Sort and DCCS performance could be accounted for by changes in 
performance with age; correlations between the three tasks were not significant when controlling 
for age. 
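
The logic of controlling for age can be illustrated with the standard first-order partial 
correlation formula, which needs only the three bivariate correlations. The sketch below uses 
the rounded values from Table 3 for List Sort, DCCS, and Age. Because it works from rounded 
inputs rather than the imputed datasets, it is an approximation of the reported analysis and 
not a reproduction of it. The function name is our own. 

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Rounded bivariate correlations from Table 3.
r_listsort_dccs = 0.46
r_listsort_age = 0.64
r_dccs_age = 0.56

r_partial = partial_correlation(r_listsort_dccs, r_listsort_age, r_dccs_age)
print(f"List Sort x DCCS, controlling for age: r = {r_partial:.2f}")
```

With these inputs the partial correlation comes out near .16, in line with the value reported 
in Table 3: most of the raw association between the two tasks is shared variance with age. 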

It is somewhat surprising that we did not find relationships between performance on the 
Flanker and the other two EF measures before controlling for age, or between any of the three 
measures after controlling for age, since measures of different EF constructs often correlate (e.g., 
Blackwell & Munakata, 2014; Carlson, Moses, & Breton, 2002; Koch, Gade, Schuch, & 
Phillipp, 2010). However, the three tasks were designed to measure distinct aspects of the overall 
cognitive resource system, which may differ within an individual (Blackwell, Chatham, 
Wiseheart, & Munakata, 2014). Furthermore, given the strong correlations between age and each 
of the EF measures (e.g., Davidson et al., 2006), it may be difficult to detect variability beyond 
those age-associated changes. 


Table 3. Bivariate and partial correlations between age in months and the EF measures. 

                    Flanker      List Sort    DCCS 
Age (months)        0.42*        0.64**       0.56** 
Flanker             -            0.25         0.26 
List Sort           0.02         -            0.46** 
DCCS                0.03         0.16         - 

Values in the Age row and above the diagonal are bivariate correlations; values below the 
diagonal are correlations controlling for age in months. n = 58 for all correlations. 

* p < .05 
** p < .01 
*** p < .001 

Relations Between Analogical Reasoning and Executive Functions 

To explore whether and how individual differences in executive functions related to 
analogical reasoning performance, two linear regression models using performance on the three 
EF tasks and Age were built to predict performance on key measures within the Scene Analogy 
task. Model 1 included Age only and Model 2 included both EF and Age3 (Table 4). The goal 
was to determine whether including the three EF constructs explained significantly more 
variability on the scene analogy task than age alone. In both models, variables were entered 
simultaneously. 

We summarize two main patterns in advance: (1) the model including both EF and Age 
always provided the best fit for Scene Analogy performance, though it was not significantly 
better than the Age alone model on every measure, and (2) only the List Sort Working Memory 
task emerged as a significant independent predictor of analogy performance. 

3 We also ran a model including EF only. The results were largely the same as the EF and Age 
model, with WM emerging as the only significant predictor of analogy performance. Only one 
scene analogy measure (2N problem accuracy) was significantly predicted by WM in the EF 
only model but not the EF and Age model, so we report only the EF and Age model here. 

Table 4. Summary of linear regression models predicting Scene Analogy performance 

R2 by model 
                      Model 1        Model 2          ΔR2 
                      (Age)          (Age + 3 EFs)    (Model 1 to 2) 
Overall Accuracy      [illegible]    0.364            [illegible] 
Featural Errors       [illegible]    0.182            [illegible] 
Relational Errors     [illegible]    0.413            [illegible] 
1N Accuracy           0.067          0.191            0.124 
1D Accuracy           0.211          0.250            [illegible] 
2N Accuracy           0.122          0.182            [illegible] 
2D Accuracy           [illegible]    0.389            [illegible] 

B (SE) by predictor 
                      Model 1              Model 2 
                      Age                  Age                DCCS               Flanker            List Sort 
Overall Accuracy      0.005*** (0.001)     0.002 (0.002)      0.000 (0.012)      0.012 (0.022)      0.018* (0.008) 
Featural Errors       -0.001** (0.000)     -0.001^ (0.001)    0.001 (0.005)      -0.003 (0.008)     -0.002 (0.003) 
Relational Errors     -0.003*** (0.001)    -0.001 (0.001)     0.000 (0.007)      -0.011 (0.012)     -0.011* (0.004) 
1N Accuracy           0.002^ (0.001)       0.000 (0.002)      -0.006 (0.012)     0.013 (0.024)      0.018* (0.008) 
1D Accuracy           0.006*** (0.002)     0.004 (0.002)      -0.004 (0.016)     0.028 (0.029)      0.008 (0.011) 
2N Accuracy           0.004** (0.001)      0.001 (0.002)      0.004 (0.016)      0.010 (0.028)      0.015 (0.011) 
2D Accuracy           0.007*** (0.002)     0.003 (0.002)      0.006 (0.017)      -0.001 (0.030)     0.029* (0.011) 

^ p < .10 
* p < .05 
** p < .01 
*** p < .001 

Overall response types. The proportion of children’s correct responses, featural errors, 
and relational errors on the scene analogy task were predicted using multiple linear regression 
models with Age in months and scores from the three EF measures as predictors: Flanker 
(inhibitory control and attention), List Sort (working memory), and DCCS (cognitive flexibility). 

Both models significantly predicted children’s overall accuracy on the scene analogy task 
(Age only: F(1,56) = 19.538; EF+Age: F(4,53) = 7.685). The EF+Age model (Model 2) was the 
best predictor of children’s overall accuracy on the scene analogy task, accounting for 36.4% of 
the variance. Among the three EF measures, only the List Sort task was a significant individual 
predictor of accuracy. In addition, the EF+Age model was a marginally better predictor of 
accuracy than a model including only Age (Model 1). 

Both the EF+Age model (F(4,53) = 2.948) and the Age only model (F(1,56) = 11.124) 
were significant predictors of children’s featural errors. In the EF+Age model, which provided 
the best fit and accounted for 18.2% of the variance, Age was a marginally significant individual 
predictor. The model with both EF and Age was not significantly better than the model with Age 
alone. 

Children’s relational errors were significantly predicted by both models (Age only: 
F(1,56) = 23.015; EF+Age: F(4,53) = 9.408). The best-fitting EF+Age model accounted for 
41.3% of the variance. Only the List Sort (working memory) task was a significant individual 
predictor of relational errors in this model. Including both EF and Age significantly improved the 
fit of the model, compared to Age alone. 

Correct responses on different problem types. To better understand how EF relates to 
the distinct demands of different types of problems, the same multiple linear regression models 
with Age in months and scores from the three EF measures were used to predict accuracy on 
each type of scene analogy problem. These models offer a more detailed breakdown of how EF 
and Age relate to overall accuracy on the scene analogy task. 

For 1-relation, No Distractor problems (1N), the EF+Age model (F(4,53) = 3.169) 
significantly predicted accuracy. The Age only model was only marginally predictive (F(1,56) = 
4.014). The EF+Age model provided the best fit and accounted for 19.1% of the variance on this 
type of problem. Only the List Sort task (working memory) was a significant individual predictor 
of accuracy in this model. The model including both EF and Age was a marginally better 
predictor of 1N performance than a model including only Age. 
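
The comparison between the two models can be illustrated with the conventional F test for a 
change in R-squared between nested regressions. The sketch below applies it to the 1N values 
reported above. The Age only R-squared is recovered from its F ratio, F(1,56) = 4.014, and the 
EF+Age R-squared is the reported 19.1%. This is an approximation under our own assumptions: the 
reported statistics are averaged over multiply imputed datasets, so the study's exact test may 
differ. The function name is our own. 

```python
def f_change(r2_reduced: float, r2_full: float, n: int, k_reduced: int, k_full: int):
    """F test for the R-squared gain of a full model over a nested reduced model.

    Returns (F, df_numerator, df_denominator).
    """
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f_value, df_num, df_den


# Age only model for 1N accuracy: R-squared recovered from F(1, 56) = 4.014.
r2_age_only = 4.014 / (4.014 + 56)
# EF+Age model for 1N accuracy: 19.1% of the variance, as reported.
r2_ef_age = 0.191

f_value, df_num, df_den = f_change(r2_age_only, r2_ef_age, n=58, k_reduced=1, k_full=4)
print(f"F change({df_num}, {df_den}) = {f_value:.2f}")
```

The resulting F of roughly 2.7 on (3, 53) degrees of freedom falls between the approximate 
critical values for p = .10 and p = .05 (about 2.19 and 2.78), consistent with the marginal 
improvement described for 1N problems. 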

For 1-relation, Distractor problems (1D), both models were significant (Age only: 
F(1,56) = 14.992; EF+Age: F(4,53) = 4.442). The EF+Age model accounted for 25.0% of the 
variance on this type of problem and provided the best fit. Age was a marginally significant 
individual predictor in the model. Including both EF and Age in the model did not significantly 
improve the fit compared to a model with only Age. 

For 2-relation, No Distractor problems (2N), both models were again significant (Age 
only: F(1,56) = 7.753; EF+Age: F(4,53) = 2.962). Accounting for 18.2% of the variance, the 
EF+Age model was the best-fitting model of 2N accuracy. However, neither age nor any of the 
EF measures was a significant (or marginal) individual predictor in this model. Including both 
EF and Age in the model did not explain more of the variance compared to an Age only model. 

Finally, for 2-relation, Distractor problems (2D), the Age only (F(1,56) = 20.228) and 
EF+Age models (F(4,53) = 8.537) both significantly predicted accuracy. The model with both 
EF and Age provided the best fit, accounting for 38.9% of the variance on this problem type. 
Performance on the List Sort (working memory) task was a significant individual predictor of 
accuracy in the EF+Age model. Including Age and EF measures marginally improved the 
model’s fit over a model with just Age. 

Replication with a separate dataset. As part of a different study on long-term memory 
and analogical reasoning, an additional 24 children (15 girls) between the ages of 5 and 6 (Mage = 
6.03, SDage = .61, Range 5.05-6.95 years) completed the Flanker Inhibitory Control task, the List 
Sort Working Memory task, and the Scene Analogy task, using the same methods outlined in this 
paper. To test the replicability of our findings, we ran two linear regression models to predict 
performance on the Scene Analogy task: one with age in months (Age alone) and a second with 
age and the scores on the two EF tasks entered simultaneously (EF+Age). Overall, we replicated 
our main patterns of results with these participants. Specifically, working memory predicted 
performance on the scene analogy task, even when controlling for age and inhibitory control. In 
addition, including the EF measures significantly improved the fit of the models, over the models 
using age alone as a predictor. 

Discussion 

In this study, we examined the relationships between age, individual differences in 
executive function (EF), and 5- to 11-year-olds’ analogical mapping performance. As prior work 
has found, age was highly predictive of children’s success on the analogical reasoning task. For 
the first time, we showed that children’s working memory capacity, an important 
component of executive function, also predicts analogy performance. Together, age and 
performance on the EF measures provided the most accurate predictions of children’s analogical 
reasoning. Even after controlling for age, which is highly related to EF development, working 
memory (WM) remained a significant predictor of performance, suggesting that this relationship 
was not due to other age-related changes that may be involved in analogical development. In 
addition, we were able to replicate these general patterns in a separate dataset of 5- to 
6-year-olds, providing further assurance that the relationship between working memory and 
analogy performance was genuine. These findings illuminate the mechanisms underlying analogical 
development by (1) demonstrating a within-individual relationship between WM capacity and 
analogical reasoning skill, and (2) providing specificity about what circumstances and behaviors 
are most related to individual capacity. 

We further considered the speculation that working memory and inhibitory control might 
predict distinct aspects of analogical reasoning behavior. In particular, we conjectured that WM 
capacity would be especially related to problems with multiple relations and to errors involving 
relational integration (see Halford, 1993), whereas inhibitory control would be more highly 
related to problems with competing object matches and to errors involving featural distraction 
(Richland et al., 2006). We found that working memory was related to children’s reasoning more 
broadly, at both high and low levels of relational complexity, suggesting that individual 
differences in working memory may be relevant even when representational demands are not 
being strained. We did not find evidence in this study that inhibitory control (or cognitive 
flexibility) was a significant individual predictor of children’s analogical performance, though 
this could be explained by the nature of the specific tasks that were administered. 

The observed relationship between working memory and analogical performance is 
consistent with findings from the adult literature. In adults, when working memory resources are 
compromised, both the ability to integrate multiple relations and the ability to resist featural 
distraction on analogy tasks suffer (Waltz et al., 1999, 2000; Morrison et al., 2004). Interactions 
between increasing representational and inhibitory demands on performance indicate that in the 
context of analogical reasoning, WM capacity and inhibitory processes in WM share a common 
resource (Cho et al., 2007), and thus it may be difficult to disentangle the distinct contributions 
from each. 

In adults, it is the inhibitory processes in working memory that have been shown to be 
predictive of analogical reasoning. One explanation for why we did not find that inhibitory 
control was related to Scene Analogy performance in this study is that our measure of IC, the 
Flanker task, does not need to be done in working memory and therefore may not measure the 
type of inhibition recruited in analogical reasoning (or the important interaction between working 
memory and inhibition, e.g., Cho et al., 2007). It is possible that a task requiring inhibitory 
control in WM (for example, the Dots task used in Davidson et al., 2006, which requires children 
to remember a set of response rules as well as inhibit prepotent responses) would be more related 
to children’s analogical performance. 

The NIH Flanker also showed only moderate convergent validity with another measure of 
inhibitory control, the D-KEFS Color Word Interference test (Delis, Kaplan, & Kramer, 2001), 
in 8- to 15-year-olds. D-KEFS requires children to ignore a written color word in order to name 
the (incongruent) color of the text instead. It is conceptually quite similar to the Flanker task, in 
that children must selectively attend to certain aspects of the stimulus and override a conflicting, 
prepotent response associated with the unattended information. However, the conflicting 
information on the Flanker task is spatially separated: narrowing the focus of attention to the 
center of the screen could bypass attention to the conflicting information, reducing the need for 
inhibitory control. In contrast, children on the D-KEFS must attend to the same spatial location, 
and thus cannot avert the conflict resolution requirements. Spatially separating conflicting 
information, so that attention to the conflicting information can be avoided, has been shown to 
benefit children’s executive control performance (Chevalier, Blaye, Dufau, & Lucenet, 2010); 
likewise, featural distraction only creates decrements in adults’ performance when that 
information has been attended to (Cho et al., 2007). Any single task introduces unique, 
task-specific variance that can mask the underlying construct and skew its relationship to other 
tasks and capacities, or obscure important interactions between the construct and its context. 
Thus, it may be that the NIH Flanker was not an ideal measure of inhibitory control for this study. 

Another possibility is that inhibitory control would be associated with avoiding featural 
distraction for younger children, but not for children in the age range tested here. In fact, only the 
youngest age group in our study (5- to 6-year-olds) showed a decrement in performance on 
Distractor problems, and many of the demonstrations of the object bias in analogical 
development have been conducted with younger, preschool-aged children (e.g., Christie & 
Gentner, 2010; Richland et al., 2006; Thibaut et al., 2010b). Substantial gains in inhibitory 
control also happen during the preschool years, just younger than most of the children in our 
study (Davidson et al., 2006), and inhibitory control has been shown to be predictive of problem 
solving earlier, rather than later, in children’s development (Blakey et al., 2016; Senn et al., 
2004). Potentially, gains in other capacities, such as working memory, alleviate demands on 
inhibitory control (Engle, 2002), so that for children of the ages in this study, individual 
variability in inhibitory control capacity is no longer strongly predictive of analogical 
performance. 

Regardless, our findings do not support the alternative hypothesis that age, but not 
components of EF, is related to children’s analogical reasoning performance. This result would 
have been expected if other age-related changes accounted wholly for developmental 
improvements in analogical reasoning. In that case, after controlling for age (and 
correspondingly, other age-linked factors related to analogical performance such as knowledge 
accretion), there should not have been any additional variance predicted by the EF measures. 
Instead, we found that differences in working memory were related to individual differences in 
analogical reasoning beyond effects of age. Of course, our results are not sufficient to conclude 
that increases in working memory themselves cause improvements in analogical reasoning over 
development, but they are consistent with a view supported by prior developmental and adult 
research that working memory (and possibly other executive control resources, though we did 
not find evidence for that in our study) provides part of the foundation on which analogical 
reasoning ability is built. 

We consider this research an important initial exploratory step towards understanding 
how these skills co-develop within individual children. Future work will need to include a 
greater variety of EF and analogy tasks to provide more robust measurement of these skills. 
Future work should also strive to look more broadly at a full range of development. As we have 
already alluded to, different aspects of EF may play more or less prominent roles in analogical 
reasoning as children’s capacities develop. 

Nonetheless, this work highlights the necessity of seriously considering the constraints 
imposed by the structure and limits of EF resources like working memory, and when and how 
changes in these resources influence analogical development. A comprehensive account of 
analogical development will need to specify how working memory and other contributors to 
executive function interact with other factors involved in development, including relational 
knowledge, relational language, strategy use, patterns of attention, and so on. For example, 
relational language may allow children to construct efficient, robust representations in working 
memory, reducing load and helping children direct attention (Morrison & Cho, 2008; Vales & 
Smith, 2015). 




From an applied perspective, understanding how executive function resources limit or 
support analogical reasoning has implications for educational practice, by helping curriculum 
designers and teachers create and implement more effective instruction that takes cognitive 
demands and student capacity into account (Begolli, Richland, & Jaeggi, 2015; Richland & 
McDonough, 2010; Richland & Simms, 2015). For example, making the representations of 
to-be-compared analogues visible simultaneously during analogical instruction can alleviate 
working memory demands, freeing those resources for forging the connections that make 
analogical learning so powerful (Begolli et al., 2015). 